TECSniffTextEncoding
Sniffs a text stream of unknown encoding, based on an array of possible encodings, and returns the probable encodings in a ranked list.
pascal OSStatus TECSniffTextEncoding (
    TECSnifferObjectRef encodingSniffer,
    TextPtr inputBuffer,
    ByteCount inputBufferLength,
    TextEncoding testEncodings[],
    ItemCount numTextEncodings,
    ItemCount numErrsArray[],
    ItemCount maxErrs,
    ItemCount numFeaturesArray[],
    ItemCount maxFeatures);
encodingSniffer
- A pointer to a sniffer object.
inputBuffer
- The text to be sniffed.
inputBufferLength
- The length of the input buffer.
testEncodings[]
- An array of text encoding specifications. On input, you must specify which text encodings you want to sniff for. On output, this array contains the input array rearranged in the order of most likely to least likely text encodings.
numTextEncodings
- A value of type ItemCount that specifies the number of entries in the testEncodings[] parameter.
numErrsArray[]
- An array of type ItemCount. This array must contain at least numTextEncodings elements. On return, numErrsArray[] holds the number of errors found for each possible text encoding. The entries are in the same order as the entries in the testEncodings[] parameter on output.
maxErrs
- The maximum number of errors allowed for a sniffer. While filling in the numErrsArray[] list, the sniffer stops examining an encoding once this number of errors is reached.
numFeaturesArray[]
- An array of type ItemCount. This array must contain at least numTextEncodings elements. On return, numFeaturesArray[] holds the number of features found for each possible text encoding. The entries are in the same order as the entries in the testEncodings[] parameter on output.
maxFeatures
- The maximum number of features allowed for a sniffer. While filling in the numFeaturesArray[] list, the sniffer stops examining an encoding once this number of features is reached.
function result
- A result code. See "Text Encoding Conversion Manager Result Codes" (page 42) for a list of possible values. If this function returns a result code other than noErr, then one of the conversion plug-ins accessed by the converter encountered an error condition while accessing a sniffer function.
DISCUSSION
For a specified stream of bytes in an unknown encoding and an array of possible encodings, TECSniffTextEncoding returns counts of "errors" and "features" for each of the encodings. Each error indicates a code point or sequence that is illegal in the specified encoding, and each feature indicates the presence of a sequence that is characteristic of that encoding. Table 3-1 shows sample output from a sniffer run.
Table 3-1  Sample Sniffer Output

Encoding           Errors   Features
EUC                0        8
JIS                0        0
Mac OS Japanese    20       20

For example, the byte sequence that is interpreted in Mac OS Roman as "äøéö" could legally be interpreted either as Mac OS Roman text or as Mac OS Japanese text. Both sniffers would return zero errors, but the Mac OS Japanese sniffer would also return two features of Mac OS Japanese (representing two legal 2-byte characters).
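The error and feature counts for the "äøéö" example can be illustrated with a minimal sketch in C. This is not the Text Encoding Conversion Manager's sniffer, whose internals are not described here. It assumes simplified Shift-JIS-style byte ranges for Mac OS Japanese (lead bytes 0x81-0x9F and 0xE0-0xFC, trail bytes 0x40-0x7E and 0x80-0xFC), and the names SniffCounts, is_lead, is_trail, and count_sjis_like are hypothetical.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch; not part of the TEC API. */
typedef struct {
    unsigned long errors;    /* byte sequences illegal in the encoding */
    unsigned long features;  /* sequences characteristic of the encoding */
} SniffCounts;

/* Assumed, simplified Shift-JIS-style ranges for 2-byte characters. */
static int is_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

static int is_trail(unsigned char b)
{
    return (b >= 0x40 && b <= 0x7E) || (b >= 0x80 && b <= 0xFC);
}

/* Counts a legal lead/trail pair as one feature and a lead byte without
 * a legal trail byte as one error. Single-byte values count as neither. */
SniffCounts count_sjis_like(const unsigned char *buf, size_t len)
{
    SniffCounts counts = {0, 0};
    size_t i = 0;

    while (i < len) {
        if (!is_lead(buf[i])) {
            i += 1;                      /* single-byte character */
        } else if (i + 1 < len && is_trail(buf[i + 1])) {
            counts.features += 1;        /* legal 2-byte character */
            i += 2;
        } else {
            counts.errors += 1;          /* dangling or mismatched lead byte */
            i += 1;
        }
    }
    return counts;
}
```

In Mac OS Roman, "äøéö" is the byte sequence 0x8A 0xBF 0x8E 0x9A. Under the ranges assumed above, those four bytes form two legal lead/trail pairs, so the sketch reports zero errors and two features, matching the description in the text.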
The arrays are returned as a ranked list with the most likely text encodings first. The results are sorted first by number of errors (fewest to most), then by number of features (most to fewest), and then by the original order in the list. Upon return from the function, you can assume the correct encoding is in testEncodings[0], or possibly testEncodings[1].
If any of the available encodings are not examined, their number of errors and number of features are set to 0xFFFFFFFF, and they sort to the end of the list.
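The sort order described above can be sketched in C as follows. This is an illustration of the documented ordering only, not the function's actual implementation. The Candidate record, the NOT_EXAMINED constant, and the function names are hypothetical; the real function returns its results in three parallel arrays rather than in a single record.

```c
#include <stdlib.h>

/* Value the text gives for the counts of encodings that were not examined. */
#define NOT_EXAMINED 0xFFFFFFFFUL

/* Hypothetical record that groups one entry from each output array. */
typedef struct {
    const char   *name;            /* label for the encoding */
    unsigned long errors;          /* entry from numErrsArray[] */
    unsigned long features;        /* entry from numFeaturesArray[] */
    size_t        original_index;  /* position in the input list */
} Candidate;

/* Fewest errors first, then most features, then original order. */
static int compare_candidates(const void *pa, const void *pb)
{
    const Candidate *a = pa;
    const Candidate *b = pb;

    if (a->errors != b->errors)
        return a->errors < b->errors ? -1 : 1;
    if (a->features != b->features)
        return a->features > b->features ? -1 : 1;
    if (a->original_index != b->original_index)
        return a->original_index < b->original_index ? -1 : 1;
    return 0;
}

/* Records the input order, then sorts the list in place. */
void rank_candidates(Candidate *list, size_t count)
{
    for (size_t i = 0; i < count; i++)
        list[i].original_index = i;
    qsort(list, count, sizeof list[0], compare_candidates);
}
```

Because unexamined encodings carry an error count of 0xFFFFFFFF, the error comparison alone moves them to the end of the list. Applied to the values in Table 3-1, this ordering places EUC first (0 errors, 8 features), then JIS (0 errors, 0 features), then Mac OS Japanese (20 errors).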
SEE ALSO
The function TECCountAvailableSniffers (page 83)
The function TECGetAvailableSniffers (page 84)
The function TECCreateSniffer (page 85)